[exporter/prometheusremotewrite] Fix WAL deadlock #37630
+108
−67
I was taking a look over #20875 and hoping to finish it.
Fixes #19363
Fixes #24399
Fixes #15277
As mentioned in #24399 (comment), I used a library to help me understand how the deadlock was happening (1st commit). It showed that `persistToWal` was trying to acquire the lock while `readPrompbFromWal` held it forever.
I changed the strategy here: instead of using fs.Notify and all the complicated logic around it, we now use a pub/sub strategy between the writer and reader goroutines.
Once the reader goroutine finds an empty WAL, it now releases the lock immediately and waits for a notification from the writer. Previously, it would hold the lock while waiting for a write that could never happen, because the writer was blocked trying to acquire that same lock.
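To illustrate the idea, here is a minimal sketch of the release-then-wait pattern. It is not the exporter's actual code: the `wal` type, its `write` and `read` methods, and the in-memory slice standing in for the on-disk WAL are all hypothetical, and it assumes a single reader goroutine.

```go
package main

import "sync"

// wal is a hypothetical in-memory stand-in for the on-disk WAL.
type wal struct {
	mu      sync.Mutex
	entries []string
	// notify carries "new data" signals from the writer to the reader.
	// A buffer of 1 lets the writer signal without ever blocking.
	notify chan struct{}
}

func newWAL() *wal {
	return &wal{notify: make(chan struct{}, 1)}
}

// write appends an entry under the lock, then signals the reader.
func (w *wal) write(entry string) {
	w.mu.Lock()
	w.entries = append(w.entries, entry)
	w.mu.Unlock()

	select {
	case w.notify <- struct{}{}:
	default: // a notification is already pending; nothing more to do
	}
}

// read returns the next entry, blocking until one is available or
// done is closed. The lock is never held while waiting.
func (w *wal) read(done <-chan struct{}) (string, bool) {
	for {
		w.mu.Lock()
		if len(w.entries) > 0 {
			entry := w.entries[0]
			w.entries = w.entries[1:]
			w.mu.Unlock()
			return entry, true
		}
		// Empty WAL: release the lock first so the writer can proceed.
		w.mu.Unlock()

		select {
		case <-w.notify: // the writer appended something; re-check
		case <-done:
			return "", false
		}
	}
}
```

Because the notification channel is buffered, a write that lands between the reader's unlock and its `select` is not lost: the signal stays pending and the reader picks it up on the next wait. A stale signal only costs one extra pass through the loop.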